Paper No: 146 Adaptively Detecting Aggregation Bursts in Data Streams

Authors

  • Shouke Qin
  • Weining Qian
  • Aoying Zhou
Abstract

Finding bursts in data streams is attracting considerable attention in the research community because of its broad applications. Existing burst detection methods suffer from three problems: 1) the window size and the absolute burst threshold, which are hard to determine a priori, must be given in advance; 2) only one-sided bursts, i.e. either increasing or decreasing bursts, can be detected; 3) bumps, which are changes in the aggregated data caused by noise, are often reported as bursts, and this disturbance adds considerable effort to the subsequent exploration of mining results. In this paper, a general burst model is introduced to overcome these three problems. We develop an efficient algorithm for adaptively detecting aggregation bursts in a data stream given a burst ratio. With the help of a novel inverted histogram, the statistical summary is compressed to fit in limited main memory, so that bursts on windows of any length can be detected accurately and efficiently online. Theoretical analysis shows that the space and time complexity bounds of this method are relatively good, while experimental results demonstrate the applicability and efficiency of our algorithm in different application settings.

Keywords: Data Streams, Data Mining and Knowledge Discovery, Burst Detection
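To make the idea of a ratio-based, two-sided burst test concrete, here is a minimal sketch. It is not the authors' algorithm: it assumes, for illustration only, that a burst is declared when the sum over the latest window is at least `ratio` times (increasing) or at most `1/ratio` times (decreasing) the sum over the immediately preceding window of the same length. The function name `detect_bursts` and its parameters are hypothetical, and the sketch handles one fixed window length, whereas the abstract describes detection on windows of any length using an inverted histogram, which is not reproduced here.

```python
from itertools import accumulate


def detect_bursts(stream, window, ratio):
    """Flag two-sided bursts by comparing adjacent windows of equal length.

    Assumption (not taken from the paper): a burst at position i means the
    sum of the latest `window` values ending at i is >= ratio times, or
    <= 1/ratio times, the sum of the `window` values just before them.
    """
    if window <= 0 or ratio <= 1:
        raise ValueError("window must be positive and ratio must exceed 1")

    # Prefix sums let every window sum be computed in constant time.
    prefix = [0] + list(accumulate(stream))
    bursts = []
    for end in range(2 * window, len(stream) + 1):
        current = prefix[end] - prefix[end - window]
        previous = prefix[end - window] - prefix[end - 2 * window]
        if previous <= 0:
            continue  # ratio is undefined here; this sketch simply skips it
        if current >= ratio * previous:
            bursts.append((end - 1, "increasing"))
        elif current * ratio <= previous:
            bursts.append((end - 1, "decreasing"))
    return bursts


if __name__ == "__main__":
    print(detect_bursts([1, 1, 1, 1, 5, 5, 1, 1], window=2, ratio=2))
```

Because the test is relative, no absolute threshold has to be chosen in advance, and the same ratio covers both increasing and decreasing bursts. Filtering out noise-induced bumps would need an additional rule that this sketch does not attempt.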


Similar resources

Towards Burst Detection for Non-Stationary Stream Data

Detecting bursts in data streams is an important and challenging task, especially in stock market, traffic control or sensor network streams. Burst detection means the identification of non regular behavior within data streams. A specifically crucial challenge on burst detection is to identify bursts in the case of non-stationary data. One approach is to apply thresholds to discover such bursts...


Indexing and Querying Data Streams

Online monitoring of data streams poses a challenge in many data-centric applications including network traffic management, trend analysis, web-click streams, intrusion detection, and sensor networks. Indexing techniques used in these applications have to be time and space efficient while providing a high quality of answers to user queries: (I) queries that monitor aggregates, such as finding s...


Detecting Time Correlations in Time-Series Data Streams

In this paper, a novel method for analyzing time-series data and extracting time-correlations among multiple time-series data streams is described. The time-correlations tell us the relationships and dependencies among time-series data streams. Reusable time-correlation rules can be fed into various analysis tools, such as forecasting or simulation tools, for further analysis. Statistical techn...


An introduction to Stream Data Management on Large Information Networks

In recent times there has been a surge of large scale information networks arising in various application domains, ranging from communication networks, cellphone call networks, social networks, email networks, road traffic networks, financial transaction networks, to name a few. In such applications there is a need to manage and process large data streams in near-real time. Examples of such que...


ERCIM “Alain Bensoussan” Fellowship Scientific Report

The Affinity Propagation (AP) clustering algorithm proposed by Frey and Dueck (2007) provides an understandable, nearly optimal summary of a data set, through message passing among all pairs of data items. However, it suffers two major shortcomings: i) the number of clusters is vague with the user-defined parameter called self-confidence, and ii) the quadratic computational complexity. When aim...



Publication date: 2004